{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a6b93b38-493c-4b9b-a6df-82ef055cfe00",
   "metadata": {},
   "source": [
    "# Dynascoring"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "46c53791-fe0c-4831-ad16-0678cf171adc",
   "metadata": {},
   "outputs": [],
   "source": [
    "__author__ = \"Christopher Potts\"\n",
    "__version__ = \"CS224u, Stanford, Spring 2023\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75adfeb6-f95c-4da5-afe5-2fd1c860273f",
   "metadata": {},
   "source": [
    "## Contents\n",
    "\n",
    "1. [Overview](#Overview)\n",
    "1. [Set-up](#Set-up)\n",
    "1. [Scoring function](#Scoring-function)\n",
    "1. [Example](#Example)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d442dc27-ab38-473e-836f-e07019116208",
   "metadata": {},
   "source": [
    "## Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21cf99da-b444-4f00-8e8f-3ac9d6164b1c",
   "metadata": {},
   "source": [
    "This notebook provides an implementation of the dynascoring method of [Ma et al 2021](https://papers.nips.cc/paper/2021/hash/55b1927fdafef39c48e5b73b5d61ea60-Abstract.html). Dynascores allow you to synthesize multiple metrics into a single score, with weights on the metrics expressing your assessment of the relative importance of the metrics. The notebook implements the function and then illustrates with an example from the paper."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48d08744-5272-48ef-ab68-22eb4e9fc005",
   "metadata": {},
   "source": [
    "## Set-up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "fcfcac4e-d9fb-4283-ac48-22b7e658c167",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce62ca2a-78f6-4ef3-b65b-70fee4f0d20b",
   "metadata": {},
   "source": [
    "## Scoring function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a57ec549-26e0-4eed-8b0a-0434a4afd987",
   "metadata": {},
   "outputs": [],
   "source": [
    "def dynascore(\n",
    "        data,\n",
    "        weights,\n",
    "        perf_metric_field_name,\n",
    "        direction_multipliers,\n",
    "        offsets,\n",
    "        delta_cutoff_proportion=0.0001):\n",
    "    \"\"\"Implementation of dynacoring.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    data: `pd.DataFrame`\n",
    "        Column names must include at least the keys of `weights`.\n",
    "    weights: `dict`\n",
    "        Map from metric names to their weights for dynascoring.\n",
    "    perf_metric_field_name: `str`\n",
    "        The metric in `weights` to use for performance.\n",
    "    direction_multipliers: `pd.Series` or `None`\n",
    "        If not `None`, then this should have the same structure as\n",
    "        `weights` but with values 1 for no change in direction and `-1`\n",
    "        to change direction for the metric.\n",
    "    offsets: `pd.Series` or `None`\n",
    "        If not `None`, then this should have the same structure as\n",
    "        `weights` and provide an adjustment of some kind for the\n",
    "        values in `weights`.\n",
    "    delta_cutoff_proportion: float\n",
    "        Default of 0.0001. This value controls the smallest scoring\n",
    "        distinction that we retain.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    pd.DataFrame containing the adjusted scores and a new column\n",
    "    `Dynascore`.\n",
    "\n",
    "    \"\"\"\n",
    "    converted_data = data.copy(deep=True)\n",
    "    converted_data.sort_values(perf_metric_field_name, inplace=True)\n",
    "\n",
    "    metrics = weights.index\n",
    "\n",
    "    # Convert the data:\n",
    "    for metric in metrics:\n",
    "        if direction_multipliers is not None:\n",
    "            converted_data[metric] *= direction_multipliers[metric]\n",
    "        if offsets is not None:\n",
    "            converted_data[metric] += offsets[metric]\n",
    "\n",
    "    converted_data[\"Dynascore\"] = 0\n",
    "\n",
    "    # Normalized the weights:\n",
    "    weights = weights / weights.sum()\n",
    "\n",
    "    # We don't want small denominators to make AMRS super sensitive to\n",
    "    # noise in the model submissions.\n",
    "    delta = converted_data.diff()\n",
    "    delta_threshold = (\n",
    "        converted_data[perf_metric_field_name].max() * delta_cutoff_proportion\n",
    "    )\n",
    "    satisfied_indices = []\n",
    "    for index in range(len(delta[perf_metric_field_name])):\n",
    "        if abs(delta[perf_metric_field_name][index]) > delta_threshold:\n",
    "            satisfied_indices.append(index)\n",
    "\n",
    "    for metric in metrics:\n",
    "        AMRS = (\n",
    "            delta[metric][satisfied_indices].abs() / delta[perf_metric_field_name][satisfied_indices]\n",
    "        ).mean(skipna=True)\n",
    "        converted_data[metric] = converted_data[metric] / abs(AMRS)\n",
    "        converted_data[\"Dynascore\"] += converted_data[metric] * weights.get(\n",
    "            metric, 0\n",
    "        )\n",
    "\n",
    "    return converted_data.sort_values(\"Dynascore\", ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7efcd93f-cfd9-4751-b8bb-9ad20224ffdf",
   "metadata": {},
   "source": [
    "## Example"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "941e32aa-9f27-4289-a48c-523ccf974117",
   "metadata": {},
   "source": [
    "These numbers are from [Ma et al. 2021](https://papers.nips.cc/paper/2021/hash/55b1927fdafef39c48e5b73b5d61ea60-Abstract.html), Table 1, top (NLI example). The output scores are somewhat different from the paper, I assume because the paper's scoring was done based on the unrounded values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e8878793-aad3-4755-8dec-81d2223464d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = {\n",
    "    \"Model\": [\"DeBERTa\", \"RoBERTa\", \"ALBERT\", \"T5\", \"BERT\", \"Majority Baseline\", \"FastText\"],\n",
    "    \"Perf\": [69.54, 69.07, 67.29, 67.16, 64.82, 32.41, 31.29],\n",
    "    \"Throughput\": [7.41, 9.23, 9.60, 7.10, 9.39, 77.33, 73.94],\n",
    "    \"Memory\": [5.71, 4.82, 2.18, 10.62, 4.13, 1.15, 2.20],\n",
    "    \"Fairness\": [91.97, 90.94, 89.94, 91.89, 92.11, 100.00, 83.23],\n",
    "    \"Robustness\": [75.70, 74.82, 74.12, 73.47, 66.38, 100.00, 69.14]\n",
    "}\n",
    "\n",
    "data = pd.DataFrame(data).set_index(\"Model\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41b34cba-041d-4ccc-9c05-be1def915657",
   "metadata": {},
   "source": [
    "Here's a look at a full set-up; the required pieces are just `weights` and `perf_metric_field_name`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2c8f1781-cd7a-4d44-beec-7971508a4c98",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The implementation normalizes the weights:\n",
    "weights = pd.Series({\n",
    "    \"Perf\": 4,\n",
    "    \"Throughput\": 1,\n",
    "    \"Memory\": 1,\n",
    "    \"Fairness\": 1,\n",
    "    \"Robustness\": 1})\n",
    "\n",
    "perf_metric_field_name = \"Perf\"\n",
    "\n",
    "# All our metrics are ones we want to increase, so these values\n",
    "# are all 1. We could use -1 to reverse direction.\n",
    "direction_multipliers = pd.Series(\n",
    "    {'Perf': 1,\n",
    "     'Throughput': 1,\n",
    "     'Memory': 1,\n",
    "     \"Fairness\": 1,\n",
    "     \"Robustness\": 1})\n",
    "\n",
    "offsets = pd.Series(\n",
    "    {'Perf': 0,\n",
    "     'Throughput': 0,\n",
    "     'Memory': 16, # (16GB - memory used), as in the paper\n",
    "     \"Fairness\": 0,\n",
    "     \"Robustness\": 0})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2138d4bf-0414-420d-ae03-44bf8378e72e",
   "metadata": {},
   "source": [
    "Example run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "daeb818e-0f04-4b53-b294-67497a8cc644",
   "metadata": {},
   "outputs": [],
   "source": [
    "scored = dynascore(\n",
    "    data,\n",
    "    weights,\n",
    "    perf_metric_field_name=perf_metric_field_name,\n",
    "    direction_multipliers=direction_multipliers,\n",
    "    offsets=offsets,\n",
    "    delta_cutoff_proportion=0.0001)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b8f56fbb-e411-4252-bfd6-2de22015f89a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Perf</th>\n",
       "      <th>Throughput</th>\n",
       "      <th>Memory</th>\n",
       "      <th>Fairness</th>\n",
       "      <th>Robustness</th>\n",
       "      <th>Dynascore</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Model</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>DeBERTa</th>\n",
       "      <td>69.54</td>\n",
       "      <td>1.511594</td>\n",
       "      <td>1.806587</td>\n",
       "      <td>16.689470</td>\n",
       "      <td>11.680170</td>\n",
       "      <td>38.730978</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>RoBERTa</th>\n",
       "      <td>69.07</td>\n",
       "      <td>1.882863</td>\n",
       "      <td>1.732527</td>\n",
       "      <td>16.502560</td>\n",
       "      <td>11.544390</td>\n",
       "      <td>38.492792</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ALBERT</th>\n",
       "      <td>67.29</td>\n",
       "      <td>1.958340</td>\n",
       "      <td>1.512840</td>\n",
       "      <td>16.321093</td>\n",
       "      <td>11.436383</td>\n",
       "      <td>37.548582</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>T5</th>\n",
       "      <td>67.16</td>\n",
       "      <td>1.448356</td>\n",
       "      <td>2.215171</td>\n",
       "      <td>16.674953</td>\n",
       "      <td>11.336091</td>\n",
       "      <td>37.539321</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BERT</th>\n",
       "      <td>64.82</td>\n",
       "      <td>1.915502</td>\n",
       "      <td>1.675109</td>\n",
       "      <td>16.714875</td>\n",
       "      <td>10.242136</td>\n",
       "      <td>36.228453</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Majority Baseline</th>\n",
       "      <td>32.41</td>\n",
       "      <td>15.774840</td>\n",
       "      <td>1.427129</td>\n",
       "      <td>18.146646</td>\n",
       "      <td>15.429551</td>\n",
       "      <td>22.552271</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FastText</th>\n",
       "      <td>31.29</td>\n",
       "      <td>15.083301</td>\n",
       "      <td>1.514504</td>\n",
       "      <td>15.103453</td>\n",
       "      <td>10.667992</td>\n",
       "      <td>20.941156</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    Perf  Throughput    Memory   Fairness  Robustness  \\\n",
       "Model                                                                   \n",
       "DeBERTa            69.54    1.511594  1.806587  16.689470   11.680170   \n",
       "RoBERTa            69.07    1.882863  1.732527  16.502560   11.544390   \n",
       "ALBERT             67.29    1.958340  1.512840  16.321093   11.436383   \n",
       "T5                 67.16    1.448356  2.215171  16.674953   11.336091   \n",
       "BERT               64.82    1.915502  1.675109  16.714875   10.242136   \n",
       "Majority Baseline  32.41   15.774840  1.427129  18.146646   15.429551   \n",
       "FastText           31.29   15.083301  1.514504  15.103453   10.667992   \n",
       "\n",
       "                   Dynascore  \n",
       "Model                         \n",
       "DeBERTa            38.730978  \n",
       "RoBERTa            38.492792  \n",
       "ALBERT             37.548582  \n",
       "T5                 37.539321  \n",
       "BERT               36.228453  \n",
       "Majority Baseline  22.552271  \n",
       "FastText           20.941156  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "scored"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
